Hoist collapse shape out of scf.forall when possible and expand its destination #19044

Merged
7 commits merged into iree-org:main on Nov 13, 2024

Conversation

nirvedhmeshram
Contributor

@nirvedhmeshram nirvedhmeshram commented Nov 6, 2024

This pattern hoists a collapse_shape producer of tensor.parallel_insert_slice out of the enclosing scf.forall and expands the scf.forall destination accordingly. This is only safe because we do it on workgroup-mapped scf.forall ops, where the slices are guaranteed to be disjoint. The reason to have this pattern is that it lets us eliminate empty tensors during bufferization, which would otherwise be blocked by the collapse.
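
For illustration, here is a minimal before/after sketch of the rewrite; the shapes, the %off/%tile values, and the 1-d mapping attribute are made up for this example, and the IR produced by the actual pass may differ in detail. Before, the collapse sits inside the workgroup-mapped scf.forall:

    %0 = scf.forall (%i) in (4) shared_outs(%out = %dest) -> (tensor<64x32xf16>) {
      // %tile : tensor<16x2x16xf16> is produced by the tiled computation (elided).
      %collapsed = tensor.collapse_shape %tile [[0], [1, 2]]
          : tensor<16x2x16xf16> into tensor<16x32xf16>
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %collapsed into %out[%off, 0] [16, 32] [1, 1]
            : tensor<16x32xf16> into tensor<64x32xf16>
      }
    } {mapping = [#iree_codegen.workgroup_mapping<x>]}

After the rewrite, the destination is expanded, the uncollapsed tile is inserted directly, and a single collapse_shape is applied to the scf.forall result:

    %expanded = tensor.expand_shape %dest [[0], [1, 2]] output_shape [64, 2, 16]
        : tensor<64x32xf16> into tensor<64x2x16xf16>
    %1 = scf.forall (%i) in (4) shared_outs(%out = %expanded) -> (tensor<64x2x16xf16>) {
      // %tile : tensor<16x2x16xf16> is produced by the tiled computation (elided).
      scf.forall.in_parallel {
        tensor.parallel_insert_slice %tile into %out[%off, 0, 0] [16, 2, 16] [1, 1, 1]
            : tensor<16x2x16xf16> into tensor<64x2x16xf16>
      }
    } {mapping = [#iree_codegen.workgroup_mapping<x>]}
    %res = tensor.collapse_shape %1 [[0], [1, 2]]
        : tensor<64x2x16xf16> into tensor<64x32xf16>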

@nirvedhmeshram nirvedhmeshram force-pushed the hoist_collapse_pr branch 2 times, most recently from d58c313 to e44793b on November 6, 2024 at 17:04
@nirvedhmeshram nirvedhmeshram changed the title from "[GPU] Hoist collapse shape out of scf.forall when possible and expand its destination" to "Hoist collapse shape out of scf.forall when possible and expand its destination" on Nov 7, 2024
Member

@kuhar kuhar left a comment


Just some nits

Contributor

@Max191 Max191 left a comment


So far looking good! For this first round of review, I left some comments about potential edge cases and some additional tests that would be good to have.

@nirvedhmeshram nirvedhmeshram force-pushed the hoist_collapse_pr branch 2 times, most recently from 9523904 to 48d5691 on November 11, 2024 at 22:16
@nirvedhmeshram nirvedhmeshram removed the request for review from benvanik November 11, 2024 22:16
Contributor

@Max191 Max191 left a comment


Looks good! The code looks a lot cleaner now too, nice work!

@Max191
Contributor

Max191 commented Nov 12, 2024

Hmm, looks like punet fails to compile with this. I think it must be one of the conv dispatches (because nothing else uses this pass), but it seems a bit strange to me that this is causing a compilation failure.

@nirvedhmeshram
Contributor Author

Hmm, looks like punet fails to compile with this. I think it must be one of the conv dispatches (because nothing else uses this pass), but it seems a bit strange to me that this is causing a compilation failure.

Yeah, that's a bummer. One thing I can try is to see whether CI passes when I disable multi-result support; if it does, the issue might be that the getTiedResult function assumes the multiple parallel_insert_slice ops are ordered in the same way as the results.

@Max191
Contributor

Max191 commented Nov 12, 2024

I took a look at the model, and found the dispatch that is failing. Here's a gist with a dump: https://gist.github.com/Max191/a092ec5c81e5f9f44aee19358f007a34

Looks like the problem is that the collapse_shape was hoisted out of the forall, but then could not fold with the flow.dispatch.tensor.store at the function boundary, because the store has an offset (from pad fusion). I'm not quite sure what the best solution for this is at the moment.

EDIT: snippet from the dump after reshape propagation:

    } -> (tensor<1x2x8x16x4x16xf16>, tensor<1x2x8x16x4x16xi8>)
    scf.forall.in_parallel {
      tensor.parallel_insert_slice %27#0 into %arg3[%arg0, %arg1, 0, 0, %arg2, 0] [1, 2, 8, 16, 4, 16] [1, 1, 1, 1, 1, 1] : tensor<1x2x8x16x4x16xf16> into tensor<2x128x8x16x20x16xf16>
      tensor.parallel_insert_slice %27#1 into %arg4[%arg0, %arg1, 0, 0, %arg2, 0] [1, 2, 8, 16, 4, 16] [1, 1, 1, 1, 1, 1] : tensor<1x2x8x16x4x16xi8> into tensor<2x128x8x16x20x16xi8>
    }
  } {mapping = [#iree_codegen.workgroup_mapping<z>, #iree_codegen.workgroup_mapping<y>, #iree_codegen.workgroup_mapping<x>]}
  %collapsed = tensor.collapse_shape %20#1 [[0], [1], [2, 3], [4, 5]] : tensor<2x128x8x16x20x16xi8> into tensor<2x128x128x320xi8>
  flow.dispatch.tensor.store %collapsed, %7, offsets = [0, 1, 1, 0], sizes = [2, 128, 128, 320], strides = [1, 1, 1, 1] : tensor<2x128x128x320xi8> -> !flow.dispatch.tensor<readwrite:tensor<2x130x130x320xi8>>
  flow.dispatch.tensor.store %20#0, %8, offsets = [0, 0, 0, 0, 0, 0], sizes = [2, 128, 8, 16, 20, 16], strides = [1, 1, 1, 1, 1, 1] : tensor<2x128x8x16x20x16xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x128x8x16x20x16xf16>>
  return
}

@Max191
Contributor

Max191 commented Nov 12, 2024

One way to fix this would be to check that the consumer of the scf.forall is a flow.dispatch.tensor.store with a full slice (i.e., zero offsets, unit strides, and source_sizes == target_sizes). This is making the pattern even more specific, but it is already quite specific, so maybe it is okay for now.
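
To make "full slice" concrete, here is a hedged sketch (shapes made up for illustration, not taken from the failing dispatch): the first store below is a full slice that the hoisted collapse_shape can fold into, while the second has a non-zero offset (e.g. from pad fusion) into a larger target, so the collapse cannot fold and the pattern should bail out:

    // Full slice: zero offsets, unit strides, sizes equal to the target shape.
    flow.dispatch.tensor.store %collapsed0, %out0, offsets = [0, 0], sizes = [128, 320], strides = [1, 1] : tensor<128x320xf16> -> !flow.dispatch.tensor<writeonly:tensor<128x320xf16>>

    // Partial slice: non-zero offset into a padded target; not a full slice.
    flow.dispatch.tensor.store %collapsed1, %out1, offsets = [1, 0], sizes = [128, 320], strides = [1, 1] : tensor<128x320xf16> -> !flow.dispatch.tensor<readwrite:tensor<130x320xf16>>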

@nirvedhmeshram
Contributor Author

nirvedhmeshram commented Nov 12, 2024

One way to fix this would be to check that the consumer of the scf.forall is a flow.dispatch.tensor.store with a full slice (i.e., zero offsets, unit strides, and source_sizes == target_sizes). This is making the pattern even more specific, but it is already quite specific, so maybe it is okay for now.

Makes sense, I guess theoretically it should be possible to support this?

flow.dispatch.tensor.store %20#1, %7, offsets = [0, 1, 0, 1, 0, 0], sizes = [2, 128, 8, 16, 20, 16], strides = [1, 1, 1, 1, 1, 1] : tensor<2x128x8x16x20x16xf16> -> !flow.dispatch.tensor<writeonly:tensor<2x130x130x20x16xf16>>

@Max191
Contributor

Max191 commented Nov 12, 2024

Following up here from a discussion with Nirvedh offline. We are going with the solution I proposed here: #19044 (comment)

As long as the convolution has a consumer in the dispatch, the pipeline should be able to handle not hoisting the collapse_shape when the dispatch.tensor.store has offsets (i.e., when a pad consumer has been fused). If there is no consumer and the result of the convolution is directly stored with some offsets, then this will cause problems, but I don't think that case is likely to be seen. There is typically an operation between convolutions and pads, and pad fusion does not happen by default anyway. It is a case to be aware of, but I think it is fine to proceed with this additional restriction for now.

In other words, the following case would be problematic if the pad is fused with the producer convolution:

%conv = linalg.conv_2d_nhwc_hwcf ...
%pad = tensor.pad %conv ...

However, this seems like an uncommon case, and pad fusion only happens with aggressive fusion anyway.

@nirvedhmeshram nirvedhmeshram force-pushed the hoist_collapse_pr branch 2 times, most recently from fdf1b03 to ddcaeda on November 12, 2024 at 19:10
@nirvedhmeshram nirvedhmeshram merged commit f3c1467 into iree-org:main Nov 13, 2024
33 of 36 checks passed
nirvedhmeshram added a commit that referenced this pull request Nov 13, 2024
…or all users (#19139)

The existing pattern added in
#19044 created different offsets
for each user even though we had previously checked that the offsets
would be exactly the same. This prevented recursive application of the
pattern, since the offset comparison for the next application of the
pattern would fail. The change in this PR is tested by removing the cse
in the test file that was added by #19044 to
work around this exact issue.

Signed-off-by: Nirvedh Meshram <[email protected]>
Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
…estination (iree-org#19044)

Groverkss pushed a commit to Groverkss/iree that referenced this pull request Dec 1, 2024
…or all users (iree-org#19139)

giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…estination (iree-org#19044)

giacs-epic pushed a commit to giacs-epic/iree that referenced this pull request Dec 4, 2024
…or all users (iree-org#19139)